First, we will look at salaries over time.
Note that salary information is only available from 1985 forward.
min(salaries$yearID)
## [1] 1985
We can observe salary trends over time using player salary data from the Lahman dataset, and US household salary data from the US Census Bureau.
Using the respective 1985 salaries as a baseline, we see that US median household income has increased by around 25% every 5 years. Growth in median player salary generally outpaced the income growth of US households until the past decade. Average (mean) player salary, in contrast, has increased at a greater rate since 1990. Because the mean is pulled up by the largest salaries while the median is not, this suggests that salary trends for “top” players may differ from those for “average” players.
Observing the trends by percentile, we can see that, indeed, salaries have not increased at the same rate across the board: the top-paid players saw large gains, while players in the lower percentiles saw little change. Next, we will compare percentiles by team.
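As a minimal sketch of the baseline comparison described above, the following computes salary percentiles by year and indexes each one to its own 1985 value. The data are synthetic (made-up log-normal salaries, not the Lahman data), and the names `sal`, `by_year`, and `indexed` are illustrative only.

```r
# Synthetic salaries (made-up numbers, not the Lahman data)
set.seed(1)
years <- rep(c(1985, 1990, 1995), each = 200)
# Log-normal salaries whose spread widens over time
salary <- rlnorm(length(years),
                 meanlog = log(3e5) + 0.05 * (years - 1985),
                 sdlog   = 0.8 + 0.02 * (years - 1985))
sal <- data.frame(yearID = years, salary = salary)

# Salary at the 10th, 50th and 90th percentile for each year
probs <- c(p10 = 0.10, p50 = 0.50, p90 = 0.90)
by_year <- t(sapply(split(sal$salary, sal$yearID),
                    quantile, probs = probs, names = FALSE))
colnames(by_year) <- names(probs)

# Index each percentile to its own 1985 value (so 1985 = 1 in every column)
indexed <- sweep(by_year, 2, by_year["1985", ], "/")
print(round(indexed, 2))
```

A value of 2 in the `p90` column would mean that the 90th-percentile salary has doubled since 1985; comparing columns shows whether the top has grown faster than the middle.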
On a team-by-team basis, we can see that salary growth differed between the median and the 90th percentile. In particular, median salary trended downward between 1991 and 1995, while the 90th percentile did not change. What may have caused this?
Using the same data to plot how much more the 90th percentile was receiving than the median, we can see an increasing trend until 1995, the year after the baseball strike of 1994, and a decreasing trend afterwards. Again, percentiles were calculated by team.
Is this a real trend? Let’s check with a piecewise linear spline regression.
##
## Call:
## lm(formula = p90rat ~ yearID + yearID * (yearID > 1994.5), data = salteam)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.4831 -2.6670 -0.7435 1.4938 21.7437
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.404e+03 1.629e+02 -8.620 <2e-16 ***
## yearID 7.081e-01 8.185e-02 8.652 <2e-16 ***
## yearID > 1994.5TRUE 1.528e+03 1.697e+02 9.005 <2e-16 ***
## yearID:yearID > 1994.5TRUE -7.664e-01 8.526e-02 -8.989 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.847 on 914 degrees of freedom
## Multiple R-squared: 0.1634, Adjusted R-squared: 0.1607
## F-statistic: 59.5 on 3 and 914 DF, p-value: < 2.2e-16
##
## ***Regression Model with Segmented Relationship(s)***
##
## Call:
## segmented.lm(obj = splfit2, seg.Z = ~yearID, psi = 1994)
##
## Estimated Break-Point(s):
## Est. St.Err
## 1994.396 0.723
##
## Meaningful coefficients of the linear terms:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.404e+03 1.629e+02 -8.620 <2e-16 ***
## yearID 7.081e-01 8.185e-02 8.652 <2e-16 ***
## U1.yearID -7.664e-01 8.526e-02 -8.989 NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.847 on 914 degrees of freedom
## Multiple R-Squared: 0.1634, Adjusted R-squared: 0.1607
##
## Convergence attained in 2 iterations with relative change 0
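To make the idea of a piecewise linear fit concrete, here is a minimal sketch on synthetic data. It is a simplification of the models above: the knot is fixed at 1994.5 rather than estimated (as `segmented` does), and the hinge term forces the two segments to meet at the knot, whereas the `lm` call above also allows a jump. The variables `year`, `y`, and `knot` are made up for illustration.

```r
set.seed(42)
year <- rep(1985:2015, each = 5)
knot <- 1994.5
# Made-up response: rises by 0.7 per year before the knot, falls by 0.06 after
y <- 10 + 0.7 * pmin(year - 1985, knot - 1985) -
     0.06 * pmax(year - knot, 0) + rnorm(length(year), sd = 0.5)
d <- data.frame(year = year, y = y)

# Continuous piecewise-linear fit: the hinge term is zero before the knot,
# so its coefficient is the change in slope at the knot
fit <- lm(y ~ year + pmax(year - knot, 0), data = d)
b <- coef(fit)
slope_before <- unname(b[2])
slope_after  <- unname(b[2] + b[3])
print(c(before = slope_before, after = slope_after))
```

The slope after the knot is the sum of the two coefficients, which mirrors how `yearID` and `U1.yearID` combine in the segmented output.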
The break looks real: the estimated break-point is 1994.4, with a standard error of 0.72. A regression discontinuity model also illustrates this break.
##
## Call:
## RDestimate(formula = p90rat ~ yearID, data = salteam, cutpoint = 1994.5)
##
## Type:
## sharp
##
## Estimates:
## Bandwidth Observations Estimate Std. Error z value
## LATE 6.049 334 2.872 1.0894 2.636
## Half-BW 3.024 166 4.335 1.6445 2.636
## Double-BW 12.097 618 1.708 0.7535 2.266
## Pr(>|z|)
## LATE 0.008394 **
## Half-BW 0.008395 **
## Double-BW 0.023436 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## F-statistics:
## F Num. DoF Denom. DoF p
## LATE 34.93 3 330 0.000e+00
## Half-BW 11.26 3 162 1.901e-06
## Double-BW 74.99 3 614 0.000e+00
#plot rdd fit
plot(rddfit)
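For intuition about what the discontinuity estimate measures, here is a rough base-R sketch of a sharp regression discontinuity: fit separate lines on each side of the cutpoint within a bandwidth and read off the jump. This is not what `RDestimate` does internally (it chooses the bandwidth from the data and, by default, weights observations with a kernel); the data, the bandwidth `h`, and the variable names are all assumptions for illustration.

```r
set.seed(7)
cutpoint <- 1994.5
year <- rep(1985:2004, each = 20)
# Made-up outcome with a true jump of 3 at the cutpoint
y <- 5 + 0.2 * (year - cutpoint) + 3 * (year > cutpoint) + rnorm(length(year))
d <- data.frame(x = year - cutpoint, y = y)  # centre the running variable
d$after <- as.numeric(d$x > 0)

# Keep observations within h years of the cutpoint and allow a separate
# slope on each side; the coefficient on `after` is the jump at the cutpoint
h <- 6
fit <- lm(y ~ x * after, data = subset(d, abs(x) <= h))
jump <- unname(coef(fit)["after"])
print(jump)
```

Halving or doubling `h` and refitting gives a quick check of how sensitive the jump is to the bandwidth, in the spirit of the Half-BW and Double-BW rows above.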
In 1994, the owners proposed a two-pronged plan to limit the effect of reduced revenues. The first part was revenue sharing, intended to keep smaller teams from having to drop out. The second was a salary cap, intended to limit the labor cost of players. Although the salary cap was not adopted, revenue sharing was. Was it effective?
Let’s look at how team/franchise revenue has changed over time. Data comes from Michael Ozanian, a writer for Forbes, who has been compiling annual financial data for Major League Baseball since 1990. Numbers have been adjusted to 1990 dollars using the Consumer Price Index (CPI).
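The adjustment to 1990 dollars can be sketched as follows: each nominal figure is multiplied by the 1990 CPI and divided by the CPI of its own year. The CPI and revenue values below are hypothetical placeholders, not official figures or the Forbes data.

```r
# Hypothetical CPI values, for illustration only (not official figures)
cpi <- data.frame(year = c(1990, 2000, 2010),
                  cpi  = c(100, 130, 165))
# Made-up nominal revenues in $ millions
revenue <- data.frame(year = c(1990, 2000, 2010),
                      revenue_nominal = c(50, 120, 200))

base_cpi <- cpi$cpi[cpi$year == 1990]
revenue <- merge(revenue, cpi, by = "year")
# Convert to 1990 dollars: scale by base-year CPI over that year's CPI
revenue$revenue_1990 <- revenue$revenue_nominal * base_cpi / revenue$cpi
print(revenue)
```

By construction the 1990 row is unchanged, and later years are deflated in proportion to how much prices have risen since 1990.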
We see that there is a drop in revenues in the year of the strike, but revenues recover after a few years. Note that the 2012 revenue data is incomplete and skewed upwards. What about franchise values?
We can see that the strike did not drastically affect franchise values. As revenues recovered, valuations rose greatly after the revenue sharing agreement, and again after 2010 (note that the 2012 valuation data is complete). Let’s look back at revenues.
We can see that teams mostly have increasing revenues year-on-year, with the biggest increases in 1997 and 2014. That’s good, but the big point of the revenue sharing agreement was to prevent losses. Was this accomplished?
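To check, we regress the share of teams with an operating loss (`oloss`) and the share of teams losing value (`deval`) on year, and then compare the shares across collective bargaining agreements (`cba`) with an ANOVA.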
##
## Call:
## lm(formula = teams ~ Year, data = oloss)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.28988 -0.10506 0.01310 0.05647 0.29085
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 29.710471 7.714400 3.851 0.000767 ***
## Year -0.014688 0.003852 -3.813 0.000845 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1473 on 24 degrees of freedom
## Multiple R-squared: 0.3772, Adjusted R-squared: 0.3513
## F-statistic: 14.54 on 1 and 24 DF, p-value: 0.0008448
##
## Call:
## lm(formula = teams ~ Year, data = deval)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.279949 -0.084264 -0.006249 0.103421 0.230952
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 32.617231 8.244450 3.956 0.000627 ***
## Year -0.016183 0.004116 -3.932 0.000666 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1484 on 23 degrees of freedom
## Multiple R-squared: 0.402, Adjusted R-squared: 0.376
## F-statistic: 15.46 on 1 and 23 DF, p-value: 0.0006664
We can see a decreasing trend: the share of teams with an operating loss, or losing value, falls by roughly 1.5 percentage points every year, for a decrease of roughly 30 percentage points over 20 years.
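The arithmetic behind that figure is simply the `Year` coefficient times the number of years. The sketch below uses the two slopes reported above and assumes that `teams` is a proportion between 0 and 1 (the fitted values and residuals are consistent with that), so multiplying by 100 gives percentage points.

```r
slope_oloss <- -0.014688  # Year coefficient from the operating-loss model
slope_deval <- -0.016183  # Year coefficient from the lost-value model
years <- 20

# Change over 20 years, in percentage points
change_pp <- 100 * years * c(oloss = slope_oloss, deval = slope_deval)
print(round(change_pp, 1))
# oloss is about -29.4 and deval about -32.4
```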
## Df Sum Sq Mean Sq F value Pr(>F)
## as.factor(cba) 4 0.3586 0.08966 6.016 0.0024 **
## Residuals 20 0.2981 0.01490
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 1 observation deleted due to missingness
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = teams ~ as.factor(cba), data = oloss)
##
## $`as.factor(cba)`
## diff lwr upr p adj
## 8-7 0.04963370 -0.1740707 0.27333810 0.9618847
## 9-7 -0.09322344 -0.3515350 0.16508815 0.8145533
## 10-7 -0.27655678 -0.5348684 -0.01824518 0.0322181
## 11-7 -0.17322344 -0.4182793 0.07183245 0.2523730
## 9-8 -0.14285714 -0.3665615 0.08084726 0.3437435
## 10-8 -0.32619048 -0.5498949 -0.10248607 0.0024846
## 11-8 -0.22285714 -0.4311146 -0.01459968 0.0323258
## 10-9 -0.18333333 -0.4416449 0.07497826 0.2489516
## 11-9 -0.08000000 -0.3250559 0.16505589 0.8624128
## 11-10 0.10333333 -0.1417226 0.34838923 0.7164209
## Df Sum Sq Mean Sq F value Pr(>F)
## as.factor(cba) 4 0.4607 0.11519 6.28 0.00214 **
## Residuals 19 0.3485 0.01834
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 1 observation deleted due to missingness
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = teams ~ as.factor(cba), data = deval)
##
## $`as.factor(cba)`
## diff lwr upr p adj
## 8-7 -0.31607143 -0.5917918 -0.04035104 0.0200669
## 9-7 -0.40714286 -0.7181974 -0.09608831 0.0069843
## 10-7 -0.34047619 -0.6515307 -0.02942164 0.0278618
## 11-7 -0.47714286 -0.7745679 -0.17971782 0.0009916
## 9-8 -0.09107143 -0.3404699 0.15832705 0.8053502
## 10-8 -0.02440476 -0.2738032 0.22499371 0.9982114
## 11-8 -0.16107143 -0.3932488 0.07110592 0.2660313
## 10-9 0.06666667 -0.2213139 0.35464722 0.9548525
## 11-9 -0.07000000 -0.3432023 0.20320234 0.9360068
## 11-10 -0.13666667 -0.4098690 0.13653568 0.5721522
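For readers who want to reproduce this kind of comparison, here is a minimal sketch of a one-way ANOVA followed by Tukey's HSD in base R. The data are made up (three hypothetical agreement periods labelled A, B, and C), so only the shape of the output matches the tables above, not the numbers.

```r
set.seed(3)
# Made-up yearly proportions for three hypothetical agreement periods
d <- data.frame(
  cba   = factor(rep(c("A", "B", "C"), each = 6)),
  teams = c(rnorm(6, 0.50, 0.05), rnorm(6, 0.45, 0.05), rnorm(6, 0.15, 0.05))
)

fit <- aov(teams ~ cba, data = d)
print(summary(fit))   # overall test: do the period means differ at all?

tuk <- TukeyHSD(fit)  # pairwise differences with family-wise 95% intervals
print(tuk$cba)        # columns: diff, lwr, upr, p adj
```

A pairwise difference is significant at the 5% level when its `lwr` and `upr` bounds have the same sign, which is equivalent to `p adj` falling below 0.05.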
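We can see that the measures taken in CBAs since the strike have reduced, and kept down, the share of teams losing value year-on-year: in the `deval` comparison, every later `cba` level (8 through 11) differs significantly from level 7. For operating losses the picture is weaker, with only the 10-7, 10-8, and 11-8 differences significant at the 5% level.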
In 2002, a new collective bargaining agreement increased revenue sharing by 70% and added a luxury tax: https://www.sbnation.com/2010/8/30/1065675/8-30-2002-baseball-avoids-another
A good history of the collective bargaining agreements: https://www.fangraphs.com/tht/a-history-of-the-collective-bargaining-agreement-part-3/
So it seems like the measures accomplish the goal of keeping teams on good financial footing. Let’s see how they affect team composition.
totsals <- salteam %>% group_by(yearID) %>% mutate(mutot = mean(totsal),sdtot = sd(totsal))
totsals <- totsals %>% mutate(stdtot = (totsal-mutot)/sdtot)
totsals %>% ggplot() + geom_point(aes(yearID,stdtot,group=yearID),size=1) +
geom_smooth(aes(yearID,stdtot))
## `geom_smooth()` using method = 'loess'
revsals <- revs %>% rename(yearID=Year)
revsals <- left_join(revsals,names,by=c("yearID","franchID"))
revsals <- left_join(revsals,totsals,by=c("yearID","teamID"))
revsals <- revsals %>% mutate(pctrev = totsal/(Revenue*1000000))
revsals %>% ggplot() + geom_boxplot(aes(yearID,pctrev,fill=factor(yearID))) +
geom_vline(aes(xintercept=as.numeric(1994)),linetype=1,size=7,alpha=0.5, color="red") +
geom_text(aes(x=1994,y=0.0),label="STRIKE",angle=90,hjust = 0,color="white") +
labs(title="Share of revenue spent on player payroll",x="Year",y="Payroll as a fraction of revenue") +
scale_x_continuous(breaks = seq(1990,2015,5)) +theme(legend.position="none")
## Warning: Removed 21 rows containing non-finite values (stat_boxplot).
WAR (Wins Above Replacement) is a statistic that estimates how many more wins a player contributes to his team than a replacement-level player would.
warteam %>% ggplot() + geom_tile(aes(x=yearID,y=factor(franfct),fill=medWAR)) +
scale_x_continuous(breaks = seq(1985,2015,5)) +
#scale_y_discrete(labels = rev(allwar$franchID)) +
scale_fill_gradient(low="#00007F", high="red", name="WAR") +
geom_vline(aes(xintercept=1994),linetype=3,size=1, color="white") +
labs(title="Median WAR by team",x="Year",y="Team/Franchise")
The teams with the highest WAR prior to 2000
warteam %>% ggplot() + geom_tile(aes(x=yearID,y=factor(franfct),fill=wpct)) +
scale_x_continuous(breaks = seq(1985,2015,5)) +
#scale_y_discrete(labels = rev(allwar$franchID)) +
scale_fill_gradient(low="#00007F", high="red", name="Win\nPercentage") +
geom_vline(aes(xintercept=1994),linetype=3,size=1, color="white") +
labs(title="Win percentage by team",x="Year",y="Team/Franchise")
Win-loss ratio and rank do not appear to have been affected; they look as random as ever.
What kind of players were paid more?
Highest-paid player on each team in 1995 (salaries in $ millions):
for (i in 1:nrow(revs)) {
revs$TID[[i]] = names$teamID[which(names$yearID==revs$Year[[i]] & names$franchID==revs$franchID[[i]])]
}
for (i in 1:nrow(salmax95)) {
salmax95$teamname[[i]] <- revs$Team[which(revs$Year==1995 & revs$TID==salmax95$teamID[[i]])]
}
salmax95$teamname <- as.character(salmax95$teamname)
salmax95 <- salmax95 %>% mutate(salchr = paste("$",round(salary/100000)/10,sep=""))
salmax95 %>% select(fullname,teamname,salchr) %>% rename(Name=fullname,Team=teamname,Salary=salchr) %>% arrange(desc(Salary))
## Name Team Salary
## 1 Cecil Fielder Tigers $9.2
## 2 Barry Bonds Giants $8.2
## 3 David Cone Blue Jays $8
## 4 Ken Griffey Mariners $7.6
## 5 Frank Thomas White Sox $7.2
## 6 Jeff Bagwell Astros $6.9
## 7 Mark McGwire Athletics $6.9
## 8 Cal Ripken Orioles $6.7
## 9 Greg Maddux Braves $6.5
## 10 Kirby Puckett Twins $6.3
## 11 Lenny Dykstra Phillies $6.2
## 12 Barry Larkin Reds $5.9
## 13 Jose Canseco Red Sox $5.8
## 14 Bret Saberhagen Mets $5.6
## 15 Gary Sheffield Marlins $5.6
## 16 Will Clark Rangers $5.6
## 17 Jack McDowell Yankees $5.4
## 18 Darryl Strawberry Dodgers $5.3
## 19 Mark Langston Angels $5
## 20 Larry Walker Rockies $5
## 21 Greg Vaughn Brewers $4.9
## 22 Wally Joyner Royals $4.8
## 23 Tony Gwynn Padres $4.7
## 24 Dennis Martinez Indians $4.6
## 25 Ken Hill Cardinals $4.5
## 26 Jay Bell Pirates $4.4
## 27 Mark Grace Cubs $4.4
## 28 Moises Alou Expos $3
The top salary on each team went to future Hall of Famers Ken Griffey Jr., Barry Larkin, Kirby Puckett, and Cal Ripken Jr., as well as other popular players of the time, including Barry Bonds, Jose Canseco, Mark McGwire, and Darryl Strawberry. So it seems that the top salaries went to popular players, who were presumably also the players who played well.
Do better players make more?
#Previously calculated salary changes and performance changes year-to-year and saved in warplus.csv
warstats <- wars %>% filter(pitcher=="N")
#warstats <- wars
warstats <- warstats[c("playerID","yearID","teamID",
"WAA","WAA_off","WAA_def",
"WAR","WAR_off","WAR_def",
"prevWAA","prevWAA_off","prevWAA_def",
"prevWAR","prevWAR_off","prevWAR_def",
"dWAA","dWAA_off","dWAA_def",
"dWAR","dWAR_off","dWAR_def",
"OPS_plus","salary","bump","cut","dsal","pdsal")]
warstats <- warstats %>% filter(yearID > 1985)
warstats <- warstats %>% filter(dsal!=0)
warstats <- warstats %>% filter(abs(dsal)>1)
warcut <- warstats %>% filter(dsal<0)
warcut %>% ggplot() + geom_histogram(aes(as.numeric(prevWAA_off)),binwidth=0.05)
warbump <- warstats %>% filter(dsal>0)
warbump %>% ggplot() + geom_histogram(aes(as.numeric(prevWAA_off)),binwidth=0.05)
Doesn’t say much. Let’s look at other trends.
warstats$pdsal[which(warstats$pdsal==Inf)] = NA
warstats %>% filter(pdsal < 40, dsal > -100000) %>% ggplot() +
geom_point(aes(prevWAA_off,dsal + abs(min(dsal))+1 )) +
geom_smooth(aes(prevWAA_off,dsal + abs(min(dsal))+1 )) +
scale_y_continuous(trans="log10")
## `geom_smooth()` using method = 'gam'
warstats %>% ggplot() +
geom_point(aes(prevWAA_off,dsal/1000000)) +
geom_smooth(aes(prevWAA_off,dsal/1000000)) +
scale_y_continuous(breaks=seq(-30,30,5)) +
labs(x="WAA in previous year", y="Salary change ($, millions)")
## `geom_smooth()` using method = 'gam'
warstats %>% ggplot() +
geom_point(aes(prevWAR_off,dsal/1000000)) +
geom_smooth(aes(prevWAR_off,dsal/1000000)) +
scale_y_continuous(breaks=seq(-30,30,5)) +
labs(x="WAR in previous year", y="Salary change ($, millions)")
## `geom_smooth()` using method = 'gam'
# warbump %>% filter(pdsal < 10000) %>% ggplot() +
# scale_y_continuous(trans="log10") +
# geom_point(aes(as.numeric(prevWAA_off),pdsal))
# warcut %>% filter(pdsal < 10000) %>% ggplot() +
# geom_point(aes(as.numeric(prevWAA),pdsal))
Again, doesn’t look like much. Let’s look at the regression.
What about for a single year?
warstats %>% filter(yearID==1991) %>% ggplot() +
geom_point(aes(prevWAR,salary/1000000)) +
geom_smooth(aes(prevWAR,salary/1000000)) +
scale_y_continuous(breaks=seq(-30,30,5),trans="log10") +
labs(x="WAR in previous year", y="Salary ($, millions)")
## `geom_smooth()` using method = 'loess'
## Warning in self$trans$transform(breaks): NaNs produced
w2 <- warstats %>% filter(yearID==1991)
#get est and r^2 by year?
summary(lm(w2$dsal~w2$prevWAA_off))
##
## Call:
## lm(formula = w2$dsal ~ w2$prevWAA_off)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1242335 -267712 -116604 142628 2142901
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 324225 23463 13.819 <2e-16 ***
## w2$prevWAA_off 165917 18256 9.088 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 435800 on 349 degrees of freedom
## Multiple R-squared: 0.1914, Adjusted R-squared: 0.1891
## F-statistic: 82.59 on 1 and 349 DF, p-value: < 2.2e-16
summary(lm(w2$dsal~w2$prevWAR_off))
##
## Call:
## lm(formula = w2$dsal ~ w2$prevWAR_off)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1342358 -184112 -66506 103337 1976559
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 133148 27485 4.844 1.91e-06 ***
## w2$prevWAR_off 164169 12931 12.696 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 400800 on 349 degrees of freedom
## Multiple R-squared: 0.3159, Adjusted R-squared: 0.314
## F-statistic: 161.2 on 1 and 349 DF, p-value: < 2.2e-16
summary(lm(w2$dsal~w2$prevWAR))
##
## Call:
## lm(formula = w2$dsal ~ w2$prevWAR)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1318664 -198423 -80652 88553 2004829
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 161395 26878 6.005 4.81e-09 ***
## w2$prevWAR 135519 11234 12.063 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 407100 on 349 degrees of freedom
## Multiple R-squared: 0.2943, Adjusted R-squared: 0.2922
## F-statistic: 145.5 on 1 and 349 DF, p-value: < 2.2e-16
summary(lm(w2$dsal ~ w2$prevWAR + w2$dWAR) )
##
## Call:
## lm(formula = w2$dsal ~ w2$prevWAR + w2$dWAR)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1273967 -183512 -75252 103165 1839214
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 145008 27543 5.265 2.46e-07 ***
## w2$prevWAR 149332 12539 11.910 < 2e-16 ***
## w2$dWAR 31980 13246 2.414 0.0163 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 404300 on 348 degrees of freedom
## Multiple R-squared: 0.3059, Adjusted R-squared: 0.3019
## F-statistic: 76.68 on 2 and 348 DF, p-value: < 2.2e-16
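Sure enough, the relationship is modest: in 1991, previous-year performance explains only about 19% to 32% of the variance in salary change. But what if it changes over time?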
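One way to see whether the fit changes over time is to run the same regression separately for every year and keep the slope and adjusted R-squared. The sketch below does this in base R on made-up data; the column names `prevWAR` and `dsal` follow the ones used above, but the values are synthetic and the name `per_year` is illustrative only.

```r
set.seed(11)
# Made-up player-seasons: the link between prevWAR and dsal weakens over time
d <- data.frame(yearID = rep(1990:1992, each = 100), prevWAR = rnorm(300))
noise_sd <- c(`1990` = 1, `1991` = 2, `1992` = 4)[as.character(d$yearID)]
d$dsal <- 1e5 * d$prevWAR + 1e5 * noise_sd * rnorm(300)

# One regression per year; keep the slope and the adjusted R-squared
per_year <- do.call(rbind, lapply(split(d, d$yearID), function(g) {
  s <- summary(lm(dsal ~ prevWAR, data = g))
  data.frame(yearID        = g$yearID[1],
             estimate      = s$coefficients["prevWAR", "Estimate"],
             adj.r.squared = s$adj.r.squared)
}))
print(per_year)
```

The resulting table has one row per year, which is the shape needed for the comparison of `adj.r.squared` across agreements that follows.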
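#Assign each year a CBA code (7-11); the 1994 strike season becomes NA
addcba <- function(df){
#bin years into (0,1993.5], (1993.5,1994.5], (1994.5,2002], (2002,2006], (2006,2010], (2010,10000]
df$cba = .bincode(df$yearID,c(0,1993.5,1994.5,2002,2006,2010,10000))
#bins 3-6 become codes 8-11
df$cba = ifelse(df$cba>2,df$cba+5,df$cba)
#bin 1 (through 1993) becomes code 7
df$cba = ifelse(df$cba==1,df$cba+6,df$cba)
#bin 2 (1994) is set to missing
df$cba = ifelse(df$cba==2,NA,df$cba)
df
}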
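The binning in `addcba` can also be written as a direct lookup, which makes the year-to-code mapping easier to read. The helper below, `cba_code`, is a hypothetical alternative and not part of the original analysis; it uses the same breaks and is meant to return the same codes.

```r
# Hypothetical helper: same bins as addcba(), written as a lookup table
cba_code <- function(yearID) {
  breaks <- c(0, 1993.5, 1994.5, 2002, 2006, 2010, 10000)
  codes  <- c(7, NA, 8, 9, 10, 11)  # NA marks the 1994 strike season
  # left.open = TRUE gives intervals of the form (lower, upper], as .bincode does
  codes[findInterval(yearID, breaks, left.open = TRUE)]
}

cba_code(c(1990, 1994, 1995, 2002, 2003, 2010, 2011))
# 7 NA 8 8 9 10 11
```

Note that boundary years such as 2002, 2006, and 2010 fall in the earlier bin, because each interval is closed on the right.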
regres <- addcba(regres)
anova.reg <- aov(adj.r.squared ~ as.factor(cba),data=regres)
summary(anova.reg)
## Df Sum Sq Mean Sq F value Pr(>F)
## as.factor(cba) 4 0.22894 0.05724 36.4 4.39e-10 ***
## Residuals 25 0.03932 0.00157
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 1 observation deleted due to missingness
TukeyHSD(anova.reg)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = adj.r.squared ~ as.factor(cba), data = regres)
##
## $`as.factor(cba)`
## diff lwr upr p adj
## 8-7 0.06186162 0.003628854 0.120094394 0.0333907
## 9-7 -0.01380545 -0.085125738 0.057514835 0.9784690
## 10-7 -0.13789710 -0.209217387 -0.066576814 0.0000596
## 11-7 -0.16410402 -0.227002599 -0.101205438 0.0000005
## 9-8 -0.07566708 -0.146987362 -0.004346789 0.0336931
## 10-8 -0.19975872 -0.271079011 -0.128438438 0.0000001
## 11-8 -0.22596564 -0.288864223 -0.163067062 0.0000000
## 10-9 -0.12409165 -0.206445222 -0.041738075 0.0014244
## 11-9 -0.15029857 -0.225476750 -0.075120384 0.0000367
## 11-10 -0.02620692 -0.101385102 0.048971265 0.8420085
regres <- addcba(regres)
anova.reg <- aov(adj.r.squared ~ as.factor(cba),data=regres)
summary(anova.reg)
## Df Sum Sq Mean Sq F value Pr(>F)
## as.factor(cba) 4 0.14687 0.03672 33.12 1.18e-09 ***
## Residuals 25 0.02772 0.00111
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 1 observation deleted due to missingness
TukeyHSD(anova.reg)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = adj.r.squared ~ as.factor(cba), data = regres)
##
## $`as.factor(cba)`
## diff lwr upr p adj
## 8-7 0.1071798491 0.05828624 0.15607346 0.0000090
## 9-7 0.0114611309 -0.04842107 0.07134333 0.9793432
## 10-7 -0.0748273792 -0.13470958 -0.01494518 0.0092462
## 11-7 -0.0740462640 -0.12685740 -0.02123513 0.0030796
## 9-8 -0.0957187182 -0.15560092 -0.03583652 0.0007211
## 10-8 -0.1820072283 -0.24188943 -0.12212503 0.0000000
## 11-8 -0.1812261131 -0.23403725 -0.12841497 0.0000000
## 10-9 -0.0862885101 -0.15543452 -0.01714250 0.0093555
## 11-9 -0.0855073949 -0.14862878 -0.02238601 0.0043512
## 11-10 0.0007811152 -0.06234027 0.06390250 0.9999996
regres %>% ggplot() + geom_bar(aes(cba,adj.r.squared),stat="identity")# + geom_errorbar()
## Warning: Removed 1 rows containing missing values (position_stack).
The fit appears to have changed along with the collective bargaining agreements. This may have to do with the luxury tax, or perhaps with the steroid era.
salteam2 <- left_join(salteam,names,by=c("teamID","yearID"))
salteam2 <- salteam2 %>% mutate(wlratio = W/L, wpct = W/(W+L))
salteam3 <- salteam2 %>% group_by(yearID) %>%
mutate(muavgy = mean(avgsal),sdavgy = sd(avgsal),
mutoty = mean(totsal),sdtoty = sd(totsal),
wpd = totsal/W)
cpi2 <- cpi %>% rename(yearID = year)
salteam4 <- merge(salteam3,cpi2,by="yearID")
salteam4 <- salteam4 %>% mutate(wpd1990 = wpd/v1900)
wlres <- salteam2 %>% group_by(yearID) %>% do(tidy(lm(wlratio ~ totsal,data=.)))
wlres <- salteam2 %>% group_by(yearID) %>% do(glance(lm(wlratio ~ p50sal,data=.)))
wlres
## # A tibble: 32 x 12
## # Groups: yearID [32]
## yearID r.squared adj.r.squared sigma statistic p.value df
## <int> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 1985 0.010728380 -0.03049127 0.3244773 0.2602734 0.614592541 2
## 2 1986 0.013038897 -0.02808448 0.2977107 0.3170677 0.578598308 2
## 3 1987 0.006414849 -0.03498453 0.2541052 0.1549504 0.697324192 2
## 4 1988 0.099804182 0.06229602 0.2930442 2.6608659 0.115899337 2
## 5 1989 0.241350125 0.20973971 0.2165099 7.6351466 0.010813630 2
## 6 1990 0.010524416 -0.03070373 0.2503326 0.2552726 0.617995237 2
## 7 1991 0.061271579 0.02215790 0.2352984 1.5664998 0.222778135 2
## 8 1992 0.007498910 -0.03385530 0.2694470 0.1813336 0.674024235 2
## 9 1993 0.235007108 0.20558430 0.2875715 7.9872439 0.008936571 2
## 10 1994 0.143842252 0.11091311 0.2954301 4.3682354 0.046540300 2
## # ... with 22 more rows, and 5 more variables: logLik <dbl>, AIC <dbl>,
## # BIC <dbl>, deviance <dbl>, df.residual <int>
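#yearly standard: payroll is standardized within each season; note that the strict `>` in the filter drops the first year of each window (e.g. 1986-1989)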
date1 = seq(1985,2010,5)
date2 = seq(1989,2014,5)
dates = data.frame(date1,date2)
for(i in 1:6){
tmpsal <- salteam3 %>% filter(yearID > dates$date1[[i]] & yearID <= dates$date2[[i]])
tmpsal <- tmpsal %>% mutate(stdsal = (totsal - mutoty)/sdtoty)
wlres <- tmpsal %>% lm(wpct ~ stdsal,data=.)
print(summary(wlres))
myplot <- tmpsal %>% ggplot() + geom_point(aes(stdsal,wpct)) + geom_smooth(aes(stdsal,wpct))
print(myplot)
}
##
## Call:
## lm(formula = wpct ~ stdsal, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.170968 -0.041127 0.003113 0.043322 0.154448
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.499950 0.006221 80.363 <2e-16 ***
## stdsal 0.011004 0.006344 1.734 0.0859 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.06344 on 102 degrees of freedom
## Multiple R-squared: 0.02865, Adjusted R-squared: 0.01913
## F-statistic: 3.008 on 1 and 102 DF, p-value: 0.08585
## `geom_smooth()` using method = 'loess'
##
## Call:
## lm(formula = wpct ~ stdsal, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.14919 -0.05129 -0.01026 0.05134 0.17869
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.500060 0.006151 81.299 < 2e-16 ***
## stdsal 0.017996 0.006268 2.871 0.00494 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.06392 on 106 degrees of freedom
## Multiple R-squared: 0.07215, Adjusted R-squared: 0.0634
## F-statistic: 8.243 on 1 and 106 DF, p-value: 0.00494
## `geom_smooth()` using method = 'loess'
##
## Call:
## lm(formula = wpct ~ stdsal, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.16329 -0.04518 0.00182 0.03722 0.14106
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.499952 0.005431 92.05 < 2e-16 ***
## stdsal 0.039849 0.005527 7.21 6.57e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0585 on 114 degrees of freedom
## Multiple R-squared: 0.3132, Adjusted R-squared: 0.3071
## F-statistic: 51.98 on 1 and 114 DF, p-value: 6.57e-11
## `geom_smooth()` using method = 'loess'
##
## Call:
## lm(formula = wpct ~ stdsal, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.20606 -0.05451 0.01146 0.05256 0.20215
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.500000 0.006899 72.477 < 2e-16 ***
## stdsal 0.036670 0.007017 5.226 7.57e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.07557 on 118 degrees of freedom
## Multiple R-squared: 0.188, Adjusted R-squared: 0.1811
## F-statistic: 27.31 on 1 and 118 DF, p-value: 7.566e-07
## `geom_smooth()` using method = 'loess'
##
## Call:
## lm(formula = wpct ~ stdsal, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.145329 -0.041975 0.002131 0.041144 0.134287
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.499977 0.005218 95.812 < 2e-16 ***
## stdsal 0.029380 0.005308 5.536 1.9e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.05716 on 118 degrees of freedom
## Multiple R-squared: 0.2061, Adjusted R-squared: 0.1994
## F-statistic: 30.64 on 1 and 118 DF, p-value: 1.904e-07
## `geom_smooth()` using method = 'loess'
##
## Call:
## lm(formula = wpct ~ stdsal, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.149096 -0.038805 0.007312 0.052624 0.114660
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.499994 0.006038 82.801 < 2e-16 ***
## stdsal 0.021162 0.006142 3.446 0.00079 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.06615 on 118 degrees of freedom
## Multiple R-squared: 0.09141, Adjusted R-squared: 0.08371
## F-statistic: 11.87 on 1 and 118 DF, p-value: 0.0007899
## `geom_smooth()` using method = 'loess'
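#5 year standard: payroll is standardized over the whole window; the strict `>` in the filter drops the first year of each window (e.g. 1986-1989)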
date1 = seq(1985,2010,5)
date2 = seq(1989,2014,5)
dates = data.frame(date1,date2)
for(i in 1:6){
tmpsal <- salteam2 %>% filter(yearID > dates$date1[[i]] & yearID <= dates$date2[[i]])
meansdsal <- tmpsal %>% summarize(meansal = mean(totsal), sdsal = sd(totsal))
tmpsal <- tmpsal %>% mutate(stdsal = (totsal - meansdsal$meansal)/meansdsal$sdsal)
wlres <- tmpsal %>% lm(wpct ~ stdsal,data=.)
print(summary(wlres))
myplot <- tmpsal %>% ggplot() + geom_point(aes(stdsal,wpct)) + geom_smooth(aes(stdsal,wpct))
print(myplot)
}
##
## Call:
## lm(formula = wpct ~ stdsal, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.172902 -0.040316 0.003635 0.042368 0.154907
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.499829 0.006214 80.440 <2e-16 ***
## stdsal 0.010701 0.005914 1.809 0.0734 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.06336 on 102 degrees of freedom
## Multiple R-squared: 0.03109, Adjusted R-squared: 0.0216
## F-statistic: 3.273 on 1 and 102 DF, p-value: 0.07335
## `geom_smooth()` using method = 'loess'
##
## Call:
## lm(formula = wpct ~ stdsal, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.148902 -0.047993 0.000505 0.046672 0.163294
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.498574 0.005951 83.773 < 2e-16 ***
## stdsal 0.019611 0.004829 4.061 9.39e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.06173 on 106 degrees of freedom
## Multiple R-squared: 0.1346, Adjusted R-squared: 0.1265
## F-statistic: 16.49 on 1 and 106 DF, p-value: 9.39e-05
## `geom_smooth()` using method = 'loess'
##
## Call:
## lm(formula = wpct ~ stdsal, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.15143 -0.04235 0.00053 0.04179 0.16215
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.496562 0.005639 88.054 < 2e-16 ***
## stdsal 0.028600 0.004450 6.427 3.13e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.06047 on 114 degrees of freedom
## Multiple R-squared: 0.266, Adjusted R-squared: 0.2595
## F-statistic: 41.31 on 1 and 114 DF, p-value: 3.129e-09
## `geom_smooth()` using method = 'loess'
##
## Call:
## lm(formula = wpct ~ stdsal, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.20521 -0.05467 0.01017 0.05316 0.20306
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.499105 0.006890 72.44 < 2e-16 ***
## stdsal 0.036616 0.006948 5.27 6.24e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.07545 on 118 degrees of freedom
## Multiple R-squared: 0.1905, Adjusted R-squared: 0.1837
## F-statistic: 27.77 on 1 and 118 DF, p-value: 6.242e-07
## `geom_smooth()` using method = 'loess'
##
## Call:
## lm(formula = wpct ~ stdsal, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.146966 -0.040119 0.005833 0.039018 0.133624
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.499374 0.005238 95.334 < 2e-16 ***
## stdsal 0.028332 0.005209 5.439 2.94e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.05737 on 118 degrees of freedom
## Multiple R-squared: 0.2005, Adjusted R-squared: 0.1937
## F-statistic: 29.59 on 1 and 118 DF, p-value: 2.938e-07
## `geom_smooth()` using method = 'loess'
##
## Call:
## lm(formula = wpct ~ stdsal, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.153625 -0.041106 0.005691 0.053141 0.114145
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.499030 0.006087 81.988 < 2e-16 ***
## stdsal 0.017940 0.005633 3.185 0.00185 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.06659 on 118 degrees of freedom
## Multiple R-squared: 0.07916, Adjusted R-squared: 0.07136
## F-statistic: 10.14 on 1 and 118 DF, p-value: 0.001852
## `geom_smooth()` using method = 'loess'
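The two sets of regressions above differ only in how payroll is standardized. A minimal sketch of the difference, on made-up payrolls for three teams over two seasons (the column names `std_yearly` and `std_window` are illustrative only):

```r
# Made-up payrolls for three teams over two seasons; payrolls double in 1992
d <- data.frame(yearID = rep(c(1991, 1992), each = 3),
                totsal = c(10, 20, 30, 20, 40, 60))

# "Yearly standard": z-score within each season
d$std_yearly <- ave(d$totsal, d$yearID,
                    FUN = function(x) (x - mean(x)) / sd(x))

# "5 year standard": one mean and one sd for the whole window
d$std_window <- (d$totsal - mean(d$totsal)) / sd(d$totsal)
print(d)
# std_yearly is -1, 0, 1 in both seasons; std_window is not
```

Standardizing within each season removes league-wide payroll growth, so a team is compared only with its rivals in the same year. Standardizing over the window keeps that growth in the predictor, so later seasons tend to have higher scores.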
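#yearly standard, with division rank as the response; the strict `>` in the filter drops the first year of each window (e.g. 1986-1989)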
date1 = seq(1985,2010,5)
date2 = seq(1989,2014,5)
dates = data.frame(date1,date2)
for(i in 1:6){
tmpsal <- salteam3 %>% filter(yearID > dates$date1[[i]] & yearID <= dates$date2[[i]])
tmpsal <- tmpsal %>% mutate(stdsal = (totsal - mutoty)/sdtoty)
wlres <- tmpsal %>% lm(Rank ~ stdsal,data=.)
print(summary(wlres))
myplot <- tmpsal %>% ggplot() + geom_point(aes(stdsal,Rank)) + geom_smooth(aes(stdsal,Rank))
print(myplot)
}
##
## Call:
## lm(formula = Rank ~ stdsal, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.0924 -1.6187 0.0335 1.6093 3.4680
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.7308 0.1824 20.456 <2e-16 ***
## stdsal -0.3406 0.1860 -1.831 0.07 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.86 on 102 degrees of freedom
## Multiple R-squared: 0.03183, Adjusted R-squared: 0.02234
## F-statistic: 3.354 on 1 and 102 DF, p-value: 0.06997
## `geom_smooth()` using method = 'loess'
##
## Call:
## lm(formula = Rank ~ stdsal, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.1463 -1.4999 -0.3132 1.5823 3.9737
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.5370 0.1745 20.269 <2e-16 ***
## stdsal -0.3701 0.1778 -2.081 0.0398 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.813 on 106 degrees of freedom
## Multiple R-squared: 0.03926, Adjusted R-squared: 0.0302
## F-statistic: 4.332 on 1 and 106 DF, p-value: 0.03981
## `geom_smooth()` using method = 'loess'
##
## Call:
## lm(formula = Rank ~ stdsal, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.4587 -1.0231 -0.0375 1.0505 3.4596
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.9483 0.1178 25.024 < 2e-16 ***
## stdsal -0.6690 0.1199 -5.579 1.65e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.269 on 114 degrees of freedom
## Multiple R-squared: 0.2145, Adjusted R-squared: 0.2076
## F-statistic: 31.13 on 1 and 114 DF, p-value: 1.653e-07
## `geom_smooth()` using method = 'loess'
##
## Call:
## lm(formula = Rank ~ stdsal, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.7496 -1.1089 0.1419 1.0274 3.0226
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.0333 0.1223 24.804 < 2e-16 ***
## stdsal -0.6439 0.1244 -5.177 9.38e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.34 on 118 degrees of freedom
## Multiple R-squared: 0.1851, Adjusted R-squared: 0.1782
## F-statistic: 26.8 on 1 and 118 DF, p-value: 9.376e-07
## `geom_smooth()` using method = 'loess'
##
## Call:
## lm(formula = Rank ~ stdsal, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.7036 -1.1284 -0.0261 1.1088 3.2836
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.0167 0.1236 24.398 < 2e-16 ***
## stdsal -0.5685 0.1258 -4.521 1.47e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.354 on 118 degrees of freedom
## Multiple R-squared: 0.1476, Adjusted R-squared: 0.1404
## F-statistic: 20.44 on 1 and 118 DF, p-value: 1.474e-05
## `geom_smooth()` using method = 'loess'
##
## Call:
## lm(formula = Rank ~ stdsal, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.4496 -1.1248 -0.1485 0.9314 2.7751
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.0250 0.1289 23.474 < 2e-16 ***
## stdsal -0.3688 0.1311 -2.814 0.00574 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.412 on 118 degrees of freedom
## Multiple R-squared: 0.06288, Adjusted R-squared: 0.05493
## F-statistic: 7.917 on 1 and 118 DF, p-value: 0.005739
## `geom_smooth()` using method = 'loess'
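The Rank regressions are run under two standardizations of team payroll. As a sketch, assuming `mutoty` and `sdtoty` hold the league-wide mean and standard deviation of total payroll for a given year (their definitions are not shown in this section), the yearly and the period versions of `stdsal` can be written as follows. The symbols S, \mu, \sigma, and P are notation introduced here for illustration only.

```latex
\[
z^{\text{year}}_{t,y} = \frac{S_{t,y} - \mu_{y}}{\sigma_{y}},
\qquad
z^{\text{period}}_{t,y} = \frac{S_{t,y} - \mu_{P}}{\sigma_{P}}
\]
% S_{t,y}: total payroll (totsal) of team t in year y
% \mu_y, \sigma_y: mean and SD of totsal across teams in year y
% \mu_P, \sigma_P: mean and SD of totsal across all team-seasons in window P
```

Under the yearly version, league-wide salary growth within a window is removed before fitting. Under the period version, later years in a window tend to have higher standardized payroll simply because salaries rose over time.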
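# Period standard: standardize payroll within each window rather than by year
date1 <- seq(1985, 2010, 5)
date2 <- seq(1989, 2014, 5)
dates <- data.frame(date1, date2)
for (i in 1:6) {
  # The strict ">" drops the first year of each window (1985, 1990, ...),
  # so each regression uses four seasons (e.g. 1986-1989)
  tmpsal <- salteam2 %>% filter(yearID > dates$date1[[i]] & yearID <= dates$date2[[i]])
  # Mean and SD of team payroll over all team-seasons in the window
  meansdsal <- tmpsal %>% summarize(meansal = mean(totsal), sdsal = sd(totsal))
  tmpsal <- tmpsal %>% mutate(stdsal = (totsal - meansdsal$meansal) / meansdsal$sdsal)
  # Regress final standings rank (lower is better) on standardized payroll
  wlres <- tmpsal %>% lm(Rank ~ stdsal, data = .)
  print(summary(wlres))
  myplot <- tmpsal %>% ggplot() + geom_point(aes(stdsal, Rank)) + geom_smooth(aes(stdsal, Rank))
  print(myplot)
}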
##
## Call:
## lm(formula = Rank ~ stdsal, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.2970 -1.6387 -0.0339 1.6072 3.6072
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.7345 0.1822 20.500 <2e-16 ***
## stdsal -0.3294 0.1734 -1.899 0.0603 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.858 on 102 degrees of freedom
## Multiple R-squared: 0.03416, Adjusted R-squared: 0.02469
## F-statistic: 3.608 on 1 and 102 DF, p-value: 0.06033
## `geom_smooth()` using method = 'loess'
##
## Call:
## lm(formula = Rank ~ stdsal, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.1544 -1.4275 -0.1564 1.2422 4.0194
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.5734 0.1689 21.161 < 2e-16 ***
## stdsal -0.4798 0.1370 -3.502 0.000678 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.752 on 106 degrees of freedom
## Multiple R-squared: 0.1037, Adjusted R-squared: 0.09523
## F-statistic: 12.26 on 1 and 106 DF, p-value: 0.0006778
## `geom_smooth()` using method = 'loess'
##
## Call:
## lm(formula = Rank ~ stdsal, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.4576 -0.9606 -0.0107 0.9254 3.6188
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.00601 0.12036 24.976 < 2e-16 ***
## stdsal -0.48703 0.09497 -5.128 1.21e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.291 on 114 degrees of freedom
## Multiple R-squared: 0.1874, Adjusted R-squared: 0.1803
## F-statistic: 26.3 on 1 and 114 DF, p-value: 1.21e-06
## `geom_smooth()` using method = 'loess'
##
## Call:
## lm(formula = Rank ~ stdsal, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.7160 -1.1112 0.1837 0.9506 2.8982
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.0492 0.1218 25.027 < 2e-16 ***
## stdsal -0.6499 0.1229 -5.289 5.73e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.334 on 118 degrees of freedom
## Multiple R-squared: 0.1917, Adjusted R-squared: 0.1848
## F-statistic: 27.98 on 1 and 118 DF, p-value: 5.729e-07
## `geom_smooth()` using method = 'loess'
##
## Call:
## lm(formula = Rank ~ stdsal, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.68509 -1.13595 0.01056 1.02661 3.16204
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.0282 0.1241 24.394 < 2e-16 ***
## stdsal -0.5436 0.1234 -4.404 2.35e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.36 on 118 degrees of freedom
## Multiple R-squared: 0.1412, Adjusted R-squared: 0.1339
## F-statistic: 19.39 on 1 and 118 DF, p-value: 2.351e-05
## `geom_smooth()` using method = 'loess'
##
## Call:
## lm(formula = Rank ~ stdsal, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.4240 -1.1755 -0.1448 0.9357 2.6941
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.0421 0.1295 23.496 < 2e-16 ***
## stdsal -0.3180 0.1198 -2.654 0.00905 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.417 on 118 degrees of freedom
## Multiple R-squared: 0.05633, Adjusted R-squared: 0.04834
## F-statistic: 7.044 on 1 and 118 DF, p-value: 0.009048
## `geom_smooth()` using method = 'loess'
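# Yearly standard: standardize payroll with each year's mean and SD (mutoty, sdtoty)
date1 <- seq(1985, 2010, 5)
date2 <- seq(1989, 2014, 5)
dates <- data.frame(date1, date2)
for (i in 1:6) {
  # The strict ">" drops the first year of each window (1985, 1990, ...),
  # so each regression uses four seasons (e.g. 1986-1989)
  tmpsal <- salteam3 %>% filter(yearID > dates$date1[[i]] & yearID <= dates$date2[[i]])
  tmpsal <- tmpsal %>% mutate(stdsal = (totsal - mutoty) / sdtoty)
  # Regress win-loss ratio on standardized payroll
  wlres <- tmpsal %>% lm(wlratio ~ stdsal, data = .)
  print(summary(wlres))
  myplot <- tmpsal %>% ggplot() + geom_point(aes(stdsal, wlratio)) + geom_smooth(aes(stdsal, wlratio))
  print(myplot)
}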
##
## Call:
## lm(formula = wlratio ~ stdsal, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.55857 -0.17972 -0.01983 0.15692 0.90961
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.03342 0.02611 39.586 <2e-16 ***
## stdsal 0.05109 0.02662 1.919 0.0578 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2662 on 102 degrees of freedom
## Multiple R-squared: 0.03485, Adjusted R-squared: 0.02539
## F-statistic: 3.683 on 1 and 102 DF, p-value: 0.05777
## `geom_smooth()` using method = 'loess'
##
## Call:
## lm(formula = wlratio ~ stdsal, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.51801 -0.21742 -0.07618 0.19493 0.93268
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.03698 0.02658 39.011 < 2e-16 ***
## stdsal 0.07270 0.02709 2.684 0.00845 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2762 on 106 degrees of freedom
## Multiple R-squared: 0.06362, Adjusted R-squared: 0.05479
## F-statistic: 7.202 on 1 and 106 DF, p-value: 0.00845
## `geom_smooth()` using method = 'loess'
##
## Call:
## lm(formula = wlratio ~ stdsal, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.52790 -0.17900 -0.02483 0.14699 1.05555
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.04262 0.02438 42.758 < 2e-16 ***
## stdsal 0.17596 0.02482 7.091 1.19e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2626 on 114 degrees of freedom
## Multiple R-squared: 0.3061, Adjusted R-squared: 0.3
## F-statistic: 50.28 on 1 and 114 DF, p-value: 1.193e-10
## `geom_smooth()` using method = 'loess'
##
## Call:
## lm(formula = wlratio ~ stdsal, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.62279 -0.21269 -0.00028 0.18769 1.40373
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.05802 0.02965 35.684 < 2e-16 ***
## stdsal 0.15825 0.03016 5.248 6.88e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3248 on 118 degrees of freedom
## Multiple R-squared: 0.1892, Adjusted R-squared: 0.1823
## F-statistic: 27.54 on 1 and 118 DF, p-value: 6.883e-07
## `geom_smooth()` using method = 'loess'
##
## Call:
## lm(formula = wlratio ~ stdsal, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.52235 -0.15494 -0.01333 0.16523 0.61128
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.03274 0.02112 48.897 < 2e-16 ***
## stdsal 0.12556 0.02148 5.845 4.6e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2314 on 118 degrees of freedom
## Multiple R-squared: 0.2245, Adjusted R-squared: 0.2179
## F-statistic: 34.16 on 1 and 118 DF, p-value: 4.596e-08
## `geom_smooth()` using method = 'loess'
##
## Call:
## lm(formula = wlratio ~ stdsal, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.4709 -0.1866 -0.0040 0.2036 0.5325
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.03796 0.02447 42.412 < 2e-16 ***
## stdsal 0.08536 0.02489 3.429 0.000834 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2681 on 118 degrees of freedom
## Multiple R-squared: 0.09063, Adjusted R-squared: 0.08292
## F-statistic: 11.76 on 1 and 118 DF, p-value: 0.0008343
## `geom_smooth()` using method = 'loess'
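# Period standard: standardize payroll within each window rather than by year
date1 <- seq(1985, 2010, 5)
date2 <- seq(1989, 2014, 5)
dates <- data.frame(date1, date2)
for (i in 1:6) {
  # The strict ">" drops the first year of each window (1985, 1990, ...),
  # so each regression uses four seasons (e.g. 1986-1989)
  tmpsal <- salteam2 %>% filter(yearID > dates$date1[[i]] & yearID <= dates$date2[[i]])
  # Mean and SD of team payroll over all team-seasons in the window
  meansdsal <- tmpsal %>% summarize(meansal = mean(totsal), sdsal = sd(totsal))
  tmpsal <- tmpsal %>% mutate(stdsal = (totsal - meansdsal$meansal) / meansdsal$sdsal)
  # Regress win-loss ratio on standardized payroll
  wlres <- tmpsal %>% lm(wlratio ~ stdsal, data = .)
  print(summary(wlres))
  myplot <- tmpsal %>% ggplot() + geom_point(aes(stdsal, wlratio)) + geom_smooth(aes(stdsal, wlratio))
  print(myplot)
}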
##
## Call:
## lm(formula = wlratio ~ stdsal, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.56787 -0.17878 -0.00164 0.15248 0.91129
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.03286 0.02606 39.64 <2e-16 ***
## stdsal 0.05010 0.02480 2.02 0.046 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2657 on 102 degrees of freedom
## Multiple R-squared: 0.03845, Adjusted R-squared: 0.02903
## F-statistic: 4.079 on 1 and 102 DF, p-value: 0.04604
## `geom_smooth()` using method = 'loess'
##
## Call:
## lm(formula = wlratio ~ stdsal, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.5206 -0.2016 -0.0310 0.1691 0.8746
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.03055 0.02558 40.287 < 2e-16 ***
## stdsal 0.08483 0.02076 4.087 8.53e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2653 on 106 degrees of freedom
## Multiple R-squared: 0.1361, Adjusted R-squared: 0.128
## F-statistic: 16.7 on 1 and 106 DF, p-value: 8.53e-05
## `geom_smooth()` using method = 'loess'
##
## Call:
## lm(formula = wlratio ~ stdsal, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.51892 -0.19926 -0.03073 0.14437 1.14747
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.02755 0.02523 40.721 < 2e-16 ***
## stdsal 0.12711 0.01991 6.384 3.86e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2706 on 114 degrees of freedom
## Multiple R-squared: 0.2634, Adjusted R-squared: 0.2569
## F-statistic: 40.76 on 1 and 114 DF, p-value: 3.856e-09
## `geom_smooth()` using method = 'loess'
##
## Call:
## lm(formula = wlratio ~ stdsal, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.60950 -0.23063 -0.00783 0.17570 1.40764
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.05416 0.02961 35.606 < 2e-16 ***
## stdsal 0.15815 0.02986 5.297 5.54e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3242 on 118 degrees of freedom
## Multiple R-squared: 0.1921, Adjusted R-squared: 0.1853
## F-statistic: 28.06 on 1 and 118 DF, p-value: 5.536e-07
## `geom_smooth()` using method = 'loess'
##
## Call:
## lm(formula = wlratio ~ stdsal, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.5298 -0.1670 -0.0036 0.1459 0.6091
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.03015 0.02118 48.628 < 2e-16 ***
## stdsal 0.12160 0.02107 5.773 6.43e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.232 on 118 degrees of freedom
## Multiple R-squared: 0.2202, Adjusted R-squared: 0.2136
## F-statistic: 33.32 on 1 and 118 DF, p-value: 6.434e-08
## `geom_smooth()` using method = 'loess'
##
## Call:
## lm(formula = wlratio ~ stdsal, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.46046 -0.18495 -0.01397 0.20660 0.53015
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.03410 0.02468 41.899 < 2e-16 ***
## stdsal 0.07188 0.02284 3.147 0.00209 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.27 on 118 degrees of freedom
## Multiple R-squared: 0.07744, Adjusted R-squared: 0.06962
## F-statistic: 9.904 on 1 and 118 DF, p-value: 0.002088
## `geom_smooth()` using method = 'loess'
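The windowed regressions repeat the same steps for each outcome and each standardization, so they could be wrapped in a small helper. The following is only a sketch in base R, not the code used to produce the output in this report: `fit_window` and the simulated `toy` data frame are hypothetical names, and the data are random numbers rather than Lahman data. It also uses inclusive bounds (`>=` and `<=`), so each window covers five seasons, whereas the loops in this section use a strict `>` on the start year.

```r
# Hypothetical helper: regress an outcome on payroll standardized within a window.
# `df` is assumed to have columns yearID, totsal, and the outcome column.
fit_window <- function(df, start, end, outcome = "wlratio") {
  # Inclusive bounds: keeps both the start and the end year
  win <- df[df$yearID >= start & df$yearID <= end, ]
  # Period standard: mean and SD over all team-seasons in the window
  win$stdsal <- (win$totsal - mean(win$totsal)) / sd(win$totsal)
  fit <- lm(reformulate("stdsal", response = outcome), data = win)
  sm <- summary(fit)
  co <- sm$coefficients["stdsal", ]
  data.frame(start = start, end = end, n = nrow(win),
             slope = co[["Estimate"]],
             p.value = co[["Pr(>|t|)"]],
             r.squared = sm$r.squared)
}

# Simulated stand-in for the team-season table (NOT real salary data)
set.seed(1)
toy <- data.frame(yearID = rep(1985:1994, each = 26),
                  totsal = rlnorm(260, meanlog = 16, sdlog = 0.4))
toy$wlratio <- 1 + 0.1 * as.numeric(scale(toy$totsal)) + rnorm(260, sd = 0.25)

# One row of results per five-year window
starts <- seq(1985, 1990, 5)
res <- do.call(rbind, lapply(starts, function(s) fit_window(toy, s, s + 4)))
print(res)
```

Collecting one row per window makes the slopes and R-squared values easier to compare than scanning the full `summary()` printouts.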
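# Per-year regression of winning percentage on yearly standardized total payroll
tmpsal <- salteam3 %>% mutate(stdsal = (totsal - mutoty) / sdtoty)
# glance() (broom) returns one row of model-level statistics per yearly fit
reg_totsal <- tmpsal %>% group_by(yearID) %>% do(glance(lm(wpct ~ stdsal, data = .)))
reg_totsal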
## # A tibble: 32 x 12
## # Groups: yearID [32]
## yearID r.squared adj.r.squared sigma statistic p.value
## <int> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1985 1.179438e-01 0.081191444 0.07456162 3.209150e+00 0.08584779
## 2 1986 4.768197e-02 0.008002049 0.06298925 1.201665e+00 0.28386748
## 3 1987 2.671143e-03 -0.038884226 0.06175368 6.427913e-02 0.80201543
## 4 1988 2.656597e-02 -0.013993778 0.07475266 6.549836e-01 0.42628770
## 5 1989 1.210865e-01 0.084465061 0.05840066 3.306440e+00 0.08151193
## 6 1990 1.013029e-06 -0.041665611 0.05701009 2.431271e-05 0.99610657
## 7 1991 5.445002e-02 0.015052109 0.05925664 1.382053e+00 0.25128263
## 8 1992 1.068086e-03 -0.040554077 0.06433800 2.566146e-02 0.87407052
## 9 1993 1.243558e-01 0.090677203 0.07153848 3.692426e+00 0.06568576
## 10 1994 1.682955e-01 0.136306831 0.06355227 5.261102e+00 0.03013797
## # ... with 22 more rows, and 6 more variables: df <int>, logLik <dbl>,
## # AIC <dbl>, BIC <dbl>, deviance <dbl>, df.residual <int>
# Same per-year fit, using average salary standardized with muavgy and sdavgy
tmpsal <- salteam3 %>% mutate(stdsal = (avgsal - muavgy) / sdavgy)
reg_avgsal <- tmpsal %>% group_by(yearID) %>% do(glance(lm(wpct ~ stdsal, data = .)))
reg_avgsal
## # A tibble: 32 x 12
## # Groups: yearID [32]
## yearID r.squared adj.r.squared sigma statistic p.value df
## <int> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 1985 0.043305297 0.003443018 0.07765222 1.08637282 0.30766239 2
## 2 1986 0.076376525 0.037892214 0.06203302 1.98461457 0.17172974 2
## 3 1987 0.001131412 -0.040488112 0.06180133 0.02718465 0.87042113 2
## 4 1988 0.074519865 0.035958193 0.07288815 1.93248532 0.17724633 2
## 5 1989 0.178082790 0.143836240 0.05647533 5.20002127 0.03176086 2
## 6 1990 0.000655151 -0.040984218 0.05699145 0.01573393 0.90122402 2
## 7 1991 0.074751769 0.036199759 0.05861704 1.93898501 0.17654687 2
## 8 1992 0.002920803 -0.038624163 0.06427831 0.07030462 0.79315789 2
## 9 1993 0.121958251 0.088187415 0.07163635 3.61134824 0.06852931 2
## 10 1994 0.209359433 0.178950180 0.06196352 6.88472800 0.01435708 2
## # ... with 22 more rows, and 5 more variables: logLik <dbl>, AIC <dbl>,
## # BIC <dbl>, deviance <dbl>, df.residual <int>
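tmpsal <- salteam3 %>% mutate(stdsal = (avgsal - muavgy) / sdavgy)
# tidy() (broom) returns coefficient-level results for each yearly fit
reg_avgsal2 <- tmpsal %>% group_by(yearID) %>% do(tidy(lm(wpct ~ stdsal, data = .)))
# Keep only the slope on standardized salary
reg_avgsal2 <- reg_avgsal2 %>% filter(term == "stdsal")
reg_avgsal2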
## # A tibble: 32 x 6
## # Groups: yearID [32]
## yearID term estimate std.error statistic p.value
## <int> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 1985 stdsal 0.016187259 0.01553044 1.0422921 0.30766239
## 2 1986 stdsal 0.017477970 0.01240660 1.4087635 0.17172974
## 3 1987 stdsal 0.002037932 0.01236027 0.1648777 0.87042113
## 4 1988 stdsal 0.020264927 0.01457763 1.3901386 0.17724633
## 5 1989 stdsal 0.025756764 0.01129507 2.2803555 0.03176086
## 6 1990 stdsal 0.001429744 0.01139829 0.1254350 0.90122402
## 7 1991 stdsal 0.016324546 0.01172341 1.3924744 0.17654687
## 8 1992 stdsal -0.003408681 0.01285566 -0.2651502 0.79315789
## 9 1993 stdsal 0.026199094 0.01378642 1.9003548 0.06852931
## 10 1994 stdsal 0.031289427 0.01192489 2.6238765 0.01435708
## # ... with 22 more rows
tmpsal <- salteam3 %>% mutate(stdsal = (avgsal - muavgy) / sdavgy)
# Per-year slope of final standings rank on standardized average salary
reg_avgsal3a <- tmpsal %>% group_by(yearID) %>% do(tidy(lm(Rank ~ stdsal, data = .)))
reg_avgsal3a <- reg_avgsal3a %>% filter(term == "stdsal")
reg_avgsal3a
## # A tibble: 32 x 6
## # Groups: yearID [32]
## yearID term estimate std.error statistic p.value
## <int> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 1985 stdsal -0.4248347 0.3808463 -1.1155016 0.27568085
## 2 1986 stdsal -0.4452098 0.3833190 -1.1614601 0.25687641
## 3 1987 stdsal -0.1981250 0.3754045 -0.5277639 0.60250898
## 4 1988 stdsal -0.4917181 0.3809434 -1.2907906 0.20907270
## 5 1989 stdsal -0.8780452 0.3496219 -2.5114134 0.01916067
## 6 1990 stdsal -0.1544620 0.3850151 -0.4011842 0.69183414
## 7 1991 stdsal -0.1040378 0.3975860 -0.2616738 0.79580564
## 8 1992 stdsal 0.1019285 0.3813981 0.2672496 0.79156015
## 9 1993 stdsal -0.7841809 0.3644724 -2.1515507 0.04089753
## 10 1994 stdsal -0.4614412 0.2513726 -1.8356863 0.07786736
## # ... with 22 more rows
tmpsal <- salteam3 %>% mutate(stdsal = (avgsal - muavgy) / sdavgy)
# Model-level statistics for the per-year Rank regressions
reg_avgsal3b <- tmpsal %>% group_by(yearID) %>% do(glance(lm(Rank ~ stdsal, data = .)))
reg_avgsal3b
## # A tibble: 32 x 12
## # Groups: yearID [32]
## yearID r.squared adj.r.squared sigma statistic p.value df
## <int> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 1985 0.049291985 0.009679151 1.904232 1.24434380 0.27568085 2
## 2 1986 0.053216699 0.013767395 1.916595 1.34898955 0.25687641 2
## 3 1987 0.011472470 -0.029716177 1.877023 0.27853476 0.60250898 2
## 4 1988 0.064915887 0.025954049 1.904717 1.66614027 0.20907270 2
## 5 1989 0.208108890 0.175113427 1.748110 6.30719715 0.01916067 2
## 6 1990 0.006661526 -0.034727577 1.925076 0.16094879 0.69183414 2
## 7 1991 0.002844932 -0.038703196 1.987930 0.06847316 0.79580564 2
## 8 1992 0.002967101 -0.038575936 1.906991 0.07142234 0.79156015 2
## 9 1993 0.151136009 0.118487394 1.893854 4.62917061 0.04089753 2
## 10 1994 0.114735226 0.080686581 1.306170 3.36974424 0.07786736 2
## # ... with 22 more rows, and 5 more variables: logLik <dbl>, AIC <dbl>,
## # BIC <dbl>, deviance <dbl>, df.residual <int>
[Figure panels: tot1, avg1, 2, 3a, 3b]
Predict awards
Conclusion